What Makes for an Effective Data Practitioner in 2024

Marck Vaisman
Senior Technical Specialist, Microsoft
Adjunct Professor, Data Science, Georgetown University

2024-03-26

In 2010, Drew Conway gave us this: our somewhat reductionist definition of a data unicorn

The OG Data Science Cheat Sheet

Analyzing the Analyzers, 2012-2013

Skills distilled from our survey

  • Algorithms (ex: computational complexity, CS theory)
  • Back-End Programming (ex: Java/Rails/Objective-C)
  • Bayesian/Monte-Carlo Statistics (ex: MCMC, BUGS)
  • Big and Distributed Data (ex: Hadoop, Map/Reduce)
  • Business (ex: management, business development, budgeting)
  • Classical Statistics (ex: general linear model, ANOVA)
  • Data Manipulation (ex: regexes, R, SAS, web scraping)
  • Front-End Programming (ex: JavaScript, HTML, CSS)
  • Graphical Models (ex: social networks, Bayes networks)
  • Machine Learning (ex: decision trees, neural nets, SVM, clustering)
  • Math (ex: linear algebra, real analysis, calculus)
  • Optimization (ex: linear, integer, convex, global)
  • Product Development (ex: design, project management)
  • Science (ex: experimental design, technical writing/publishing)
  • Simulation (ex: discrete, agent-based, continuous)
  • Spatial Statistics (ex: geographic covariates, GIS)
  • Structured Data (ex: SQL, JSON, XML)
  • Surveys and Marketing (ex: multinomial modeling)
  • Systems Administration (ex: *nix, DBA, cloud tech.)
  • Temporal Statistics (ex: forecasting, time-series analysis)
  • Unstructured Data (ex: noSQL, text mining)
  • Visualization (ex: statistical graphics, mapping, web-based dataviz)

Evolution of Data Science

timeline graphic (to-be-created)

In 2024, the hard truth: an overloaded definition and set of expectations leading to the Data Practitioner Soup

SQL, Python, and a dash of dark magic

How it started, how it’s going

The complexities of defining data science

(expand on this)

  • science
  • research paradigm
  • research method
  • discipline
  • workflow
  • profession

Mastering this breadth is extremely difficult and takes time

And teaching is even harder!

Teaching stories

WWDSD?

Build a tool to help process all this information

Build a RAG
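
A minimal sketch of the retrieval half of a RAG (retrieval-augmented generation) tool, assuming a tiny in-memory corpus and plain bag-of-words cosine similarity in place of learned embeddings. The corpus entries reuse a few items from the survey skill list; the names `DOCS`, `retrieve`, and `build_prompt` are hypothetical, and the call to a language model is deliberately left out.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus: a few skill descriptions from the survey list.
DOCS = [
    "Machine Learning: decision trees, neural nets, SVM, clustering",
    "Structured Data: SQL, JSON, XML",
    "Visualization: statistical graphics, mapping, web-based dataviz",
    "Temporal Statistics: forecasting, time-series analysis",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(text):
    """Bag-of-words term counts (a stand-in for a real embedding model)."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query."""
    q = vectorize(query)
    ranked = sorted(docs, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, docs, k=2):
    """Assemble the augmented prompt; sending it to a model is out of scope."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Calling `build_prompt("Which skills cover SQL and JSON?", DOCS, k=1)` yields a prompt whose context is the Structured Data entry. A real tool would likely swap the word counts for embeddings and a vector store, but the retrieve-then-prompt shape stays the same.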

Academic models of skills and competencies for Data Science

  • EDISON
  • IADSS

We need to be careful, though

Proposed model

scaffolding graphic

Skills continuum slide 1

still in development

Skills continuum slide 2

still in development

Skills continuum slide 3

still in development

Skills continuum slide 4

still in development

Skills continuum slide 5

still in development

Skills continuum slide 6

still in development

Call to action

What are you going to do to help fix this?